Individuals of varying (and possibly suspect) taste were asked to recommend and rank the watchability of movies from 1 (hard pass) to 10 (must watch now). Also I’m bored as shit, so let’s do this…
imdb_df <-
movie_v %>%
stringr::str_remove("\\s\\(\\d*\\)") %>%
map_dfr(imdbMovies) %>%
as_tibble()
imdb_df %>%
mutate(
imdbVotes = parse_number(imdbVotes),
Metascore = parse_number(Metascore),
imdbRating = parse_number(imdbRating)
) %>%
ggplot(
aes(
x = fct_reorder(Title, imdbRating, .desc = TRUE),
y = imdbRating
)
) +
# geom_col()
geom_col_pattern(
aes(
pattern_filename = I(Posterlink)),
pattern_gravity = "North",
pattern = "image",
pattern_scale = 1,
pattern_type = 'fit',
fill = "#69b3a2",
colour = '#69b3a2'
) +
dark_theme_gray(base_size = 16) +
scale_x_discrete(
guide = guide_axis(n.dodge = 2)
) +
labs(x = NULL,
y = "IMDB Rating") +
theme(axis.title = element_text(colour = "#69b3a2"))Now to see how close the QMC agrees…
movie_df %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F)| Timestamp | The Vast of Night (2019) | Soylent Green (1973) | Westworld (1973) | See You Yesterday (2019) | Gattaca (1997) | Contact (1997) |
|---|---|---|---|---|---|---|
| 1/9/2022 14:15:23 | 7 | 7 | 7 | 6 | 10 | 10 |
| 1/9/2022 13:24:00 | 9 | 7 | 8 | 8 | 6 | 1 |
| 1/9/2022 13:26:40 | 5 | 3 | 6 | 5 | 10 | 8 |
| 1/9/2022 14:33:42 | 8 | 3 | 5 | 7 | 7 | 6 |
| 1/9/2022 14:49:44 | 8 | 7 | 5 | 9 | 7 | 6 |
| 1/9/2022 17:36:19 | 9 | 8 | 5 | 7 | 7 | 9 |
Which movies had the highest average score?
| movie | mean |
|---|---|
| Gattaca (1997) | 7.833333 |
| The Vast of Night (2019) | 7.666667 |
| See You Yesterday (2019) | 7.000000 |
| Contact (1997) | 6.666667 |
| Westworld (1973) | 6.000000 |
| Soylent Green (1973) | 5.833333 |
Gattaca (1997) appears to be universally appreciated, while Soylent Green (1973) is clearly foul.
vote <- movie_df %>%
mutate(
across(
where(is.numeric),
~flip_convert(.x)
)
) %>%
as.data.frame() %>%
column_to_rownames(var = "Timestamp") %>%
as.matrix() %>%
create_vote(xtype = 1)
approval <- approval_method(vote, n = 3)
borda <- borda_method(vote)
copeland <- cdc_copeland(vote)
dodgson <- cdc_dodgson(vote, dq_t = "t")
borda_df <- borda$other_info$count_min %>%
data.frame() %>%
rownames_to_column("Movie") %>%
rename(Rank = 2) %>%
arrange(Rank)
copeland_df <- copeland$other_info$copeland_score %>%
data.frame() %>%
rownames_to_column("Movie") %>%
rename(Rank = 2) %>%
arrange(desc(Rank))
dodgson_df <- dodgson$other_info$tideman %>%
data.frame() %>%
rownames_to_column("Movie") %>%
rename(Rank = 2) %>%
arrange(Rank)A voter’s 1st choice gets 1 point, the 2nd choice gets 2 points… The movie with the smallest total score wins. The function does not require voters to assign scores to all Movies, for NAs are automatically assigned the highest (worst) score. Duplicated values (two or more movies share the same score) are also allowed.
Borda Winner: Gattaca (1997)
borda_df %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F)| Movie | Rank |
|---|---|
| Gattaca (1997) | 19 |
| The Vast of Night (2019) | 20 |
| See You Yesterday (2019) | 24 |
| Contact (1997) | 26 |
| Westworld (1973) | 30 |
| Soylent Green (1973) | 31 |
Movies enter into pairwise comparison. If the number of voters who prefer
Movie Ais larger than the number of voters who preferMovie B, thenMovie AwinsMovie B,Movie Agets 1 point, andMovie Bgets -1 point. If the numbers are equal, then both of them gets 0 point. Then, sum up each one’s comparison points. For example,Movie Awins 3 times, loses 1 time, has equal votes with 2 other movies,Movie A’s score is 3 * 1 + (-1) * 1 + 0 * 2 = 2. The movie that gets the most points wins.
Copeland Winner: The Vast of Night (2019)
copeland_df %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F)| Movie | Rank |
|---|---|
| The Vast of Night (2019) | 5 |
| Gattaca (1997) | 2 |
| See You Yesterday (2019) | 1 |
| Contact (1997) | 0 |
| Westworld (1973) | -3 |
| Soylent Green (1973) | -5 |
Suppose the candidates are
Paul Blart: Mall Cop 2,Casablanca,Howard the DuckandCitizen Kane. IfPaul Blart: Mall Cop 2winsCasablancain pairwise comparison or has equal votes withCasablanca, then add 0 toPaul Blart: Mall Cop 2. IfHoward the DuckwinsPaul Blart: Mall Cop 2, then add toPaul Blart: Mall Cop 2adv(Howard the Duck,Paul Blart: Mall Cop 2), that is, the number of voters that preferHoward the DuckthanPaul Blart: Mall Cop 2, minus the number of voters that preferPaul Blart: Mall Cop 2thanHoward the Duck. Again, ifCitizen KanewinsPaul Blart: Mall Cop 2, then add toPaul Blart: Mall Cop 2that number. Then, we sum up the points that belong toPaul Blart: Mall Cop 2. We do the same thing toCasablanca,Howard the DuckandCitizen Kane. The one gets the least points is the winner.
Dodgson Winner: The Vast of Night (2019)
dodgson_df %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F)| Movie | Rank |
|---|---|
| The Vast of Night (2019) | 0 |
| Gattaca (1997) | 2 |
| See You Yesterday (2019) | 3 |
| Contact (1997) | 4 |
| Soylent Green (1973) | 11 |
| Westworld (1973) | 12 |
movie_long_df %>%
mutate(movie = factor(movie, levels = rev(movie_long_mean$movie))) %>%
ggplot(
aes(
x = movie,
y = vote,
color = movie,
fill = movie
)
) +
geom_boxplot(alpha = 0.3) +
geom_jitter(width = 0.1,
height = 0.1) +
scale_color_viridis_d() +
scale_fill_viridis_d() +
dark_theme_gray(base_size = 16) +
labs(x = "Movie",
y = "Rank") +
coord_flip() +
guides(fill = "none",
color = "none")movie_long_df %>%
mutate(movie = factor(movie, levels = rev(movie_long_mean$movie))) %>%
ggplot(aes(y = movie, x = vote,
color = movie,
fill = movie)) +
geom_density_ridges(bandwidth = 0.5, alpha = 0.2) +
theme_ridges() +
dark_theme_gray(base_size = 16) +
labs(x = "Rank",
y = "Movie") +
guides(fill = "none",
color = "none") +
scale_color_viridis_d() +
scale_fill_viridis_d() +
scale_x_continuous(
expand = c(0.01, 0),
breaks = 1:length(unique(movie_long_df$movie))
)What about the highest variance?
| movie | stan_dev |
|---|---|
| Contact (1997) | 3.204164 |
| Soylent Green (1973) | 2.228602 |
| Gattaca (1997) | 1.722401 |
| The Vast of Night (2019) | 1.505545 |
| See You Yesterday (2019) | 1.414214 |
| Westworld (1973) | 1.264911 |
Who are the film critics, whom all others follow? Who are the freaks?
We define 2 measures of watching preference, mean absolute difference and the sum of squared differences.
We take the average distance of each cinema-goer’s rank for a given movie from that movie’s sample mean.
| Timestamp | mean_abs_diff |
|---|---|
| 1/9/2022 14:33:42 | 0.9444444 |
| 1/9/2022 14:49:44 | 1.0000000 |
| 1/9/2022 17:36:19 | 1.2777778 |
| 1/9/2022 14:15:23 | 1.5555556 |
| 1/9/2022 13:26:40 | 1.8333333 |
| 1/9/2022 13:24:00 | 2.1666667 |
On down the list, you have the lesser movie critics, with indisputably irrational preferences. Timestamps are provided to keep responses anonymous, but you know who you are…
According to some, the sum of squared differences is a demonstrably poorer metric than mean absolute deviation, but I indulge the reader.
| Timestamp | sum_sqrd_diff |
|---|---|
| 1/9/2022 14:49:44 | 7.611111 |
| 1/9/2022 14:33:42 | 10.277778 |
| 1/9/2022 17:36:19 | 13.611111 |
| 1/9/2022 14:15:23 | 19.611111 |
| 1/9/2022 13:26:40 | 25.611111 |
| 1/9/2022 13:24:00 | 43.611111 |